covered 2.6 million Mexican households in 50,000 villages by 2001,
renamed “Oportunidades” in 2002.
PROGRESA
Led to similar programs in:
Bangladesh, Brazil, Cambodia, Chile, Colombia, Egypt, Guatemala, Honduras, Indonesia, Jamaica, Nicaragua, Panama, Peru, Philippines, Turkey and the United States.
Explore the data with R functions such as names, dim, head, table and summary.
summary(df)
      wave         sooloca     indivill_seq      villsize_t
 Min.   :1.00   Min.   :  1   Min.   :  1.00   Min.   :  1.00
 1st Qu.:2.00   1st Qu.:131   1st Qu.:  8.00   1st Qu.: 26.00
 Median :3.00   Median :253   Median : 18.00   Median : 39.00
 Mean   :3.06   Mean   :254   Mean   : 25.63   Mean   : 50.26
 3rd Qu.:4.00   3rd Qu.:395   3rd Qu.: 34.00   3rd Qu.: 63.00
 Max.   :5.00   Max.   :491   Max.   :210.00   Max.   :210.00
   progresa1        sooind_id          sex1            age1
 Min.   :0.0000   Min.   :    1   Min.   :0.000   Min.   : 0.00
 1st Qu.:0.0000   1st Qu.: 3983   1st Qu.:0.000   1st Qu.: 9.00
 Median :1.0000   Median : 7894   Median :0.000   Median :11.00
 Mean   :0.6255   Mean   : 7877   Mean   :0.485   Mean   :11.33
 3rd Qu.:1.0000   3rd Qu.:11768   3rd Qu.:1.000   3rd Qu.:14.00
 Max.   :1.0000   Max.   :15669   Max.   :1.000   Max.   :99.00
      hgc1           poor1        school
 Min.   :0.000   Min.   :1   Min.   :0.0000
 1st Qu.:2.000   1st Qu.:1   1st Qu.:1.0000
 Median :4.000   Median :1   Median :1.0000
 Mean   :4.029   Mean   :1   Mean   :0.8275
 3rd Qu.:6.000   3rd Qu.:1   3rd Qu.:1.0000
 Max.   :9.000   Max.   :1   Max.   :1.0000
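The other exploration functions can be used in the same way as summary. A minimal sketch on a small made-up data frame: the object toy and all of its values are hypothetical, and only the column names are taken from the PROGRESA data.

```r
# Toy stand-in for df: hypothetical values, real column names
toy <- data.frame(
  wave      = c(1, 4, 1, 4, 1, 4),
  sooloca   = c(1, 1, 1, 1, 2, 2),
  sooind_id = c(1, 1, 2, 2, 3, 3),
  progresa1 = c(1, 1, 1, 1, 0, 0),
  school    = c(1, 1, 0, 1, 1, 0)
)

names(toy)                        # column names
dim(toy)                          # number of rows, number of columns
head(toy, 3)                      # first three rows
table(toy$wave)                   # number of rows in each wave
table(toy$progresa1, toy$school)  # treatment status by enrollment
```

Running the same calls on df gives a quick feel for which variables are indicators, which are ids, and which take suspicious values.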
We could try to impute them: for example, we could impute that the previous child is age \(7\) in waves \(3\) and \(4\), based on his ages in waves \(1\) and \(5\).
Instead, we will set them to missing.
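Setting values to missing might look like the sketch below. It assumes, purely for illustration, that the problem values are ages above a plausibility cutoff; the cutoff max_plausible_age and the toy data are hypothetical, and the rule actually applied to df may differ.

```r
# Toy data: child 1 has an implausible age recorded in wave 3
toy <- data.frame(
  sooind_id = c(1, 1, 1, 2, 2, 2),
  wave      = c(1, 3, 5, 1, 3, 5),
  age1      = c(6, 99, 8, 10, 11, 12)
)

max_plausible_age <- 18  # hypothetical cutoff, not taken from the source

# Replace implausible ages with NA; which() skips any existing NAs safely
toy$age1[which(toy$age1 > max_plausible_age)] <- NA

summary(toy$age1)             # summary now reports the number of NAs
mean(toy$age1, na.rm = TRUE)  # later computations must drop the NAs explicitly
```

The cost of this choice is that every later computation on the variable has to handle NA, for example with na.rm = TRUE.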
Number of Observations
Data has 74031 rows (child-wave observations).
nrow(df)
[1] 74031
We do not have 74031 independent observations.
How many distinct children are in the data?
Child id is sooind_id.
Use length and unique functions to find that there are 15669 distinct children in the data, of which 9799 are treated and 5870 are control.
length(unique(df$sooind_id))
[1] 15669
length(unique(df[df$progresa1==1,"sooind_id"]))
[1] 9799
length(unique(df[df$progresa1==0,"sooind_id"]))
[1] 5870
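The treated and control counts add up to the total (9799 + 5870 = 15669), which is what we expect if no child is recorded as treated in one wave and control in another. A minimal sketch of that check on hypothetical toy data (the names toy and switchers are illustrative only):

```r
# Toy data: four children, treatment status constant within child
toy <- data.frame(
  sooind_id = c(1, 1, 2, 2, 3, 3, 4),
  wave      = c(1, 4, 1, 4, 1, 4, 1),
  progresa1 = c(1, 1, 1, 1, 0, 0, 0)
)

n_children <- length(unique(toy$sooind_id))
n_treated  <- length(unique(toy[toy$progresa1 == 1, "sooind_id"]))
n_control  <- length(unique(toy[toy$progresa1 == 0, "sooind_id"]))

# Child ids that appear in both groups; should be empty
switchers <- intersect(toy[toy$progresa1 == 1, "sooind_id"],
                       toy[toy$progresa1 == 0, "sooind_id"])
length(switchers)

# TRUE when the two groups partition the set of children
n_treated + n_control == n_children
```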
Children within the same village (and certainly within the same family) may not be independent observations.
How many villages?
Village id is sooloca.
Use length and unique functions to find that there are 491 villages, of which 308 are treated and 183 are control.
length(unique(df$sooloca))
[1] 491
length(unique(df[df$progresa1==1,"sooloca"]))
[1] 308
length(unique(df[df$progresa1==0,"sooloca"]))
[1] 183
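An alternative way to get the same village counts is to collapse the data to one row per village first. The sketch below uses hypothetical toy data and also counts children per village, which is relevant because village is the level at which observations may be correlated.

```r
# Toy data: three villages, two of them treated
toy <- data.frame(
  sooloca   = c(1, 1, 1, 2, 2, 3),
  sooind_id = c(1, 1, 2, 3, 4, 5),
  progresa1 = c(1, 1, 1, 0, 0, 1)
)

# One row per (village, treatment status) combination
villages <- unique(toy[, c("sooloca", "progresa1")])
nrow(villages)              # number of villages
table(villages$progresa1)   # villages by treatment status

# Number of distinct children in each village
tapply(toy$sooind_id, toy$sooloca, function(id) length(unique(id)))
```

If nrow(villages) ever exceeded the number of unique village ids, some village would be recorded with both treatment statuses.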
Preparing data for analysis
Change name of variables for convenience:
progresa1 to treat,
sex1 to girl.
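One way to do the renaming in base R is to assign into names. A minimal sketch on a hypothetical toy data frame:

```r
# Toy data with the original variable names
toy <- data.frame(sooind_id = c(1, 2), progresa1 = c(1, 0), sex1 = c(0, 1))

# Rename by matching on the old name, so column order does not matter
names(toy)[names(toy) == "progresa1"] <- "treat"
names(toy)[names(toy) == "sex1"]      <- "girl"

names(toy)
```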
Using the subset function, restrict the data to the first wave (the first baseline wave) and the fourth wave (the second post-treatment wave), and keep only the relevant variables.
How many observations do we lose by restricting to a balanced sample?
In a full analysis, we should redo the exploratory analysis on the balanced sample and compare it to the full sample.
nrow(df[df$wave==1,])
[1] 14996
nrow(df[df$wave==4,])
[1] 15610
nrow(df2[df2$wave==1,])
[1] 14996
nrow(df2[df2$wave==4,])
[1] 14996
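In the output above, wave 1 keeps all 14996 rows while wave 4 drops from 15610 to 14996, a loss of 614 rows. One way a balanced sample like df2 could be built is sketched below on toy data. The list of kept variables and the names toy_w and toy2 are assumptions, and the sketch presumes the variables have already been renamed to treat and girl.

```r
# Toy data: child 3 is observed only in wave 4
toy <- data.frame(
  sooind_id = c(1, 1, 1, 2, 2, 3),
  wave      = c(1, 2, 4, 1, 4, 4),
  treat     = c(1, 1, 1, 0, 0, 1),
  girl      = c(0, 0, 0, 1, 1, 1),
  school    = c(1, 1, 1, 1, 0, 1),
  hgc1      = c(3, 3, 4, 5, 6, 2)
)

# Keep waves 1 and 4 and a (hypothetical) set of relevant variables
toy_w <- subset(toy, wave %in% c(1, 4),
                select = c(sooind_id, wave, treat, girl, school))

# Balanced sample: children observed in both wave 1 and wave 4
ids_w1 <- toy_w$sooind_id[toy_w$wave == 1]
ids_w4 <- toy_w$sooind_id[toy_w$wave == 4]
toy2   <- subset(toy_w, sooind_id %in% intersect(ids_w1, ids_w4))

# Rows lost in wave 4 by balancing
nrow(toy_w[toy_w$wave == 4, ]) - nrow(toy2[toy2$wave == 4, ])
```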
Preparing data for analysis
Using the ifelse command, create an indicator variable for whether an observation is from the post period, i.e., a variable that equals 1 if the observation is from wave 4 and 0 otherwise.
Create separate data frames for pre and post treatment observations.
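These two steps might look like the following sketch on hypothetical toy data. The names post, pre_df and post_df are illustrative choices, not necessarily the ones used in the analysis.

```r
# Toy balanced sample: two children, each observed in waves 1 and 4
toy2 <- data.frame(
  sooind_id = c(1, 1, 2, 2),
  wave      = c(1, 4, 1, 4),
  treat     = c(1, 1, 0, 0),
  school    = c(1, 1, 1, 0)
)

# Indicator: 1 for the post-treatment wave (wave 4), 0 otherwise
toy2$post <- ifelse(toy2$wave == 4, 1, 0)

# Separate data frames for pre- and post-treatment observations
pre_df  <- toy2[toy2$post == 0, ]
post_df <- toy2[toy2$post == 1, ]

# Sanity check: post should line up exactly with wave
table(toy2$wave, toy2$post)
```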
To what degree are these results real, versus driven by sampling noise?
Hypothesis testing!
References
Attanasio, Orazio, Emla Fitzsimons, Ana Gomez, Martha Isabel Gutierrez, Costas Meghir, and Alice Mesnard. 2010. “Children’s Schooling and Work in the Presence of a Conditional Cash Transfer Program in Rural Colombia.” Economic Development and Cultural Change 58 (2): 181–210.
Bruhn, Miriam, and David McKenzie. 2009. “In Pursuit of Balance: Randomization in Practice in Development Field Experiments.” American Economic Journal: Applied Economics 1 (4): 200–232. https://www.jstor.org/stable/25760187.
Grolemund, Garrett. 2014. Hands-On Programming with R: Write Your Own Functions and Simulations. O’Reilly Media, Inc.
Hlavac, Marek. 2022. “Stargazer: LaTeX Code and ASCII Text for Well-Formatted Regression and Summary Statistics Tables.” R package version 5.2.3. https://CRAN.R-project.org/package=stargazer.
International Food Policy Research Institute (IFPRI). 2018. “Mexico, Evaluation of PROGRESA.” Harvard Dataverse. https://doi.org/10.7910/DVN/05BMJY.
Lee, Soohyung, and Azeem M Shaikh. 2014. “Multiple Testing and Heterogeneous Treatment Effects: Re-Evaluating the Effect of Progresa on School Enrollment.” Journal of Applied Econometrics 29 (4): 612–26. https://home.uchicago.edu/~amshaikh/webfiles/progresa.pdf.
Schultz, T Paul. 2004. “School Subsidies for the Poor: Evaluating the Mexican Progresa Poverty Program.” Journal of Development Economics 74 (1): 199–250.
Sievert, Carson. 2020. Interactive Web-Based Data Visualization with R, Plotly, and Shiny. Chapman and Hall/CRC. https://plotly-r.com.
Skoufias, Emmanuel, Susan W Parker, Jere R Behrman, and Carola Pessino. 2001. “Conditional Cash Transfers and Their Impact on Child Work and Schooling: Evidence from the Progresa Program in Mexico [with Comments].” Economia 2 (1): 45–96. https://www.jstor.org/stable/20065413.
Wickham, Hadley. 2016. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. https://ggplot2.tidyverse.org.